1. INTRODUCTION


In the literature there is considerable interest in the incorporation of
contextual information into classification, especially in the development of
methods for character recognition. Generally, one of two basic approaches
has been followed: the table look-up method or the Markov approach. The
table look-up method is based on the assumption that every word in the text
is selected from a known finite table. A word of text is classified by
comparing it with every word in the table having the same length and finding
the best match.

The Markov approach is based on the assumption that the true category of a
character is related in a probabilistic manner to the true categories of a
small number of surrounding characters. Its use leads to the estimation,
from sample text, of the probabilities of all possible pairs, triples, etc.,
of characters (i.e., the transition probabilities).

Abend (ref. 1) derived optimal procedures when a Markov dependence exists 
between the states of nature, and Raviv (ref. 2) gives the results of applying 
such procedures for the recognition of English text. Chow (ref. 3), using a 
nearest-neighbor dependence method, obtained the structure and parameters 
of a recognition network for patterns represented by binary matrices. 

Use of contextual analysis in speech recognition is considered by Alter
(ref. 4). Welch and Salter (ref. 5) present an algorithm for the
incorporation of contextual information in the classification of picture
elements (pixels) in an image. Chittineni (ref. 6) discusses the use of
context with linear classifiers.

All of these approaches assume that the transition probabilities are
available or can be estimated from a sample. When the above techniques are
applied to the classification of imagery data such as that obtained in remote
sensing, it is difficult to estimate the transition probabilities, and very
often they vary from one image to the other.

This paper presents a simple model for the transition probabilities in terms
of a single parameter $\theta$ and presents methods for the incorporation of
contextual information into classification in terms of $\theta$. Techniques
for locally estimating $\theta$, based on classifier decisions and using a
maximum likelihood method, are developed. The paper is organized as follows.

Section 2 presents a model for the transition probabilities. Section 3
discusses the incorporation of context into classification. Section 4 develops
techniques for locally estimating the parameter of the transition probability
model using the maximum likelihood method. Section 5 presents conclusions.
Appendix A develops some results for estimating the parameters of transition
probabilities under the assumption of different transition probability models
for horizontal and vertical neighbors. Appendix B presents a multitemporal
interpretation of the techniques developed in the paper for remote sensing
applications by minimizing the registration errors and incorporating context
into classification.

2. A MODEL FOR REPRESENTING TRANSITION PROBABILITIES 


Let $i$ and $j$ be neighboring picture elements (pixels) with pattern vectors
$X_i$ and $X_j$ and class labels $\omega_i$ and $\omega_j$, respectively. Let
$\omega_i$ and $\omega_j$ take values $r$ and $s$. Let $P(\omega_i = r)$ be the
a priori probability of class $r$. If it is assumed that the labels of pixels
$i$ and $j$ are independent, then

$$
P(\omega_i = r \mid \omega_j = s) = P(\omega_i = r)
\quad\text{and}\quad
P(\omega_i = r \mid \omega_j = r) = P(\omega_i = r)
\tag{1}
$$

Similarly, if it is assumed that the labels of the pixels $i$ and $j$ are
completely dependent,

$$
P(\omega_i = r \mid \omega_j = s) = 0, \; r \neq s
\quad\text{and}\quad
P(\omega_i = r \mid \omega_j = r) = 1
\tag{2}
$$
Because generally a dependence exists between neighboring pixels, this
dependency is modeled through a parameter $\theta$, which lies between 0 and 1, as

$$
\begin{aligned}
P(\omega_i = r \mid \omega_j = s) &= (1 - \theta)\,P(\omega_i = r), \quad r \neq s \\
P(\omega_i = r \mid \omega_j = r) &= (1 - \theta)\,P(\omega_i = r) + \theta
\end{aligned}
\tag{3}
$$

where

$$
0 \le \theta \le 1
\tag{4}
$$

From equations (1), (2), and (3), it is easily seen that $\theta = 1$ denotes
complete dependence and that $\theta = 0$ denotes independence.

The following shows that this definition satisfies the postulates of
probability. Let there be $M$ classes. Consider that

$$
\begin{aligned}
\sum_{r=1}^{M} P(\omega_i = r \mid \omega_j = s)
&= P(\omega_i = s \mid \omega_j = s) + \sum_{\substack{r=1 \\ r \neq s}}^{M} P(\omega_i = r \mid \omega_j = s) \\
&= \left[(1 - \theta)P(\omega_i = s) + \theta\right] + \sum_{\substack{r=1 \\ r \neq s}}^{M} (1 - \theta)P(\omega_i = r) \\
&= 1 - \theta + \theta = 1
\end{aligned}
$$

thus satisfying the probability rule. However, it is to be noted that the
dependencies between the neighboring pixels can be modeled through some other
parameter. For example, $\theta$ can be replaced by a function of a parameter
$\alpha$ that lies between 0 and $\infty$, so that the dependencies depend on
$\alpha$, or by a function of a parameter $\beta$ that lies between $-\infty$
and $\infty$, so that the dependencies depend on $\beta$. This paper assumes
that the spatial dependencies are modeled according to equations (3) and (4).
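The model of equations (3) and (4) can be checked numerically. The following is a minimal sketch, not part of the original report: the function name `transition_matrix` and the example priors are illustrative assumptions. It builds the matrix of transition probabilities for a given parameter value and lets one confirm that every row sums to 1.

```python
def transition_matrix(priors, theta):
    """Return T with T[s][r] = P(w_i = r | w_j = s) as modeled in equation (3).

    `priors` holds the a priori class probabilities P(w = r); `theta` is the
    dependence parameter of equation (4). Names are illustrative assumptions.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    m = len(priors)
    return [
        [(1.0 - theta) * priors[r] + (theta if r == s else 0.0) for r in range(m)]
        for s in range(m)
    ]


example_priors = [0.5, 0.3, 0.2]              # hypothetical three-class priors
T = transition_matrix(example_priors, 0.4)
row_sums = [sum(row) for row in T]            # each entry should be 1 up to rounding
```

With `theta = 0` every row reduces to the priors (independence), and with `theta = 1` the matrix becomes the identity (complete dependence), matching equations (1) and (2).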

3. CONTEXTUAL CLASSIFIERS 

Using the transition probabilities model of the previous section, this section 
develops methods for incorporating contextual information into the classifier 
decision process. 
3.1 SPATIALLY UNIFORM CONTEXT


It is assumed that $\theta$ holds good for the transition probabilities
representation in the neighborhood under consideration. Consider a
neighborhood of pixels shown in the following figure.

Figure 1.- Four neighbors of pixel 0.

The pattern vectors and class labels of these pixels are denoted by $X_i$,
$\omega_i$, $i = 0, 1, 2, \ldots, 4$. Suppose that pixel 0 is under
consideration. Pixel 0 is classified into class $i_0$ on the basis of the
a posteriori probabilities $P(\omega_0 = i_0 \mid X_0, X_1, \ldots, X_4)$. Let
$f = p(X_0, X_1, \ldots, X_4)$. Consider the following:

$$
\begin{aligned}
P(\omega_0 = i_0 \mid X_0, \ldots, X_4)
&= \frac{p(X_0, \ldots, X_4, \omega_0 = i_0)}{p(X_0, \ldots, X_4)} \\
&= \frac{1}{f} \sum_{i_1=1}^{M} \cdots \sum_{i_4=1}^{M}
p(X_0, \ldots, X_4 \mid \omega_0 = i_0, \omega_1 = i_1, \ldots, \omega_4 = i_4)\,
P(\omega_0 = i_0, \omega_1 = i_1, \ldots, \omega_4 = i_4)
\end{aligned}
\tag{5}
$$
Making an assumption that the probability density function of a pattern given
its label is independent of other labels and patterns, one can write the
following:

$$
p(X_0, \ldots, X_4 \mid \omega_0 = i_0, \omega_1 = i_1, \ldots, \omega_4 = i_4)
= \prod_{j=0}^{4} p(X_j \mid \omega_j = i_j)
\tag{6}
$$


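Now consider

$$
\begin{aligned}
P(\omega_0 = i_0, \omega_1 = i_1, \ldots, \omega_4 = i_4)
&= P(\omega_0 = i_0)\, P(\omega_1 = i_1, \ldots, \omega_4 = i_4 \mid \omega_0 = i_0) \\
&= P(\omega_0 = i_0)\, P(\omega_1 = i_1 \mid \omega_0 = i_0, \omega_2 = i_2, \ldots, \omega_4 = i_4)\,
P(\omega_2 = i_2, \ldots, \omega_4 = i_4 \mid \omega_0 = i_0) \\
&= P(\omega_0 = i_0)\, P(\omega_1 = i_1 \mid \omega_0 = i_0)\,
P(\omega_2 = i_2, \ldots, \omega_4 = i_4 \mid \omega_0 = i_0) \\
&= P(\omega_0 = i_0) \prod_{j=1}^{4} P(\omega_j = i_j \mid \omega_0 = i_0)
\end{aligned}
\tag{7}
$$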


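Note that in equation (7), it was assumed that the labels of the pixels are
independent of the labels of the nonneighboring pixels. From equations (5),
(6), and (7), the following is obtained.

$$
P(\omega_0 = i_0 \mid X_0, X_1, \ldots, X_4)
= \frac{P(\omega_0 = i_0)\, p(X_0 \mid \omega_0 = i_0)
\displaystyle\prod_{j=1}^{4} \left[ \sum_{i_j=1}^{M} p(X_j \mid \omega_j = i_j)\, P(\omega_j = i_j \mid \omega_0 = i_0) \right]}
{p(X_0, X_1, \ldots, X_4)}
\tag{8}
$$

However, the denominator of equation (8) can be written as

$$
\begin{aligned}
p(X_0, X_1, \ldots, X_4)
&= p(X_1, \ldots, X_4)\, p(X_0 \mid X_1, \ldots, X_4) \\
&= p(X_1, \ldots, X_4) \sum_{i_0=1}^{M} p(X_0, \omega_0 = i_0 \mid X_1, \ldots, X_4) \\
&= p(X_1, \ldots, X_4) \sum_{i_0=1}^{M} p(X_0 \mid \omega_0 = i_0)\, P(\omega_0 = i_0 \mid X_1, \ldots, X_4)
\end{aligned}
$$

Now consider

$$
\begin{aligned}
P(\omega_0 = i_0 \mid X_1, \ldots, X_4)
&= \sum_{i_1=1}^{M} \cdots \sum_{i_4=1}^{M} P(\omega_0 = i_0, \omega_1 = i_1, \ldots, \omega_4 = i_4 \mid X_1, \ldots, X_4) \\
&= \frac{1}{p(X_1, \ldots, X_4)} \sum_{i_1=1}^{M} \cdots \sum_{i_4=1}^{M}
p(X_1, \ldots, X_4 \mid \omega_0 = i_0, \omega_1 = i_1, \ldots, \omega_4 = i_4)\,
P(\omega_0 = i_0, \omega_1 = i_1, \ldots, \omega_4 = i_4) \\
&= \frac{P(\omega_0 = i_0)}{p(X_1, \ldots, X_4)}
\prod_{j=1}^{4} \left[ \sum_{i_j=1}^{M} p(X_j \mid \omega_j = i_j)\, P(\omega_j = i_j \mid \omega_0 = i_0) \right]
\end{aligned}
$$

Hence, the following is obtained:

$$
p(X_0, X_1, \ldots, X_4)
= \sum_{i_0=1}^{M} P(\omega_0 = i_0)\, p(X_0 \mid \omega_0 = i_0)
\prod_{j=1}^{4} \left[ \sum_{i_j=1}^{M} p(X_j \mid \omega_j = i_j)\, P(\omega_j = i_j \mid \omega_0 = i_0) \right]
\tag{9}
$$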

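Consider

$$
\begin{aligned}
\sum_{i_j=1}^{M} p(X_j \mid \omega_j = i_j)\, P(\omega_j = i_j \mid \omega_0 = i_0)
&= \sum_{\substack{i_j=1 \\ i_j \neq i_0}}^{M} p(X_j \mid \omega_j = i_j)\,(1 - \theta)\,P(\omega_j = i_j)
+ p(X_j \mid \omega_j = i_0)\left[(1 - \theta)P(\omega_j = i_0) + \theta\right] \\
&= (1 - \theta) \sum_{i_j=1}^{M} p(X_j \mid \omega_j = i_j)\, P(\omega_j = i_j) + \theta\, p(X_j \mid \omega_j = i_0)
\end{aligned}
\tag{10}
$$

Using equations (9) and (10) in equation (8), one gets

$$
P(\omega_0 = i_0 \mid X_0, \ldots, X_4)
= \frac{P(\omega_0 = i_0)\, p(X_0 \mid \omega_0 = i_0)
\displaystyle\prod_{j=1}^{4} \left[ (1 - \theta) \sum_{i_j=1}^{M} p(X_j \mid \omega_j = i_j) P(\omega_j = i_j) + \theta\, p(X_j \mid \omega_j = i_0) \right]}
{\displaystyle\sum_{i_0=1}^{M} P(\omega_0 = i_0)\, p(X_0 \mid \omega_0 = i_0)
\prod_{j=1}^{4} \left[ (1 - \theta) \sum_{i_j=1}^{M} p(X_j \mid \omega_j = i_j) P(\omega_j = i_j) + \theta\, p(X_j \mid \omega_j = i_0) \right]}
\tag{11}
$$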


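Equation (11) can be used to update the posteriori probabilities of the
classes of pixel 0 with the incorporation of contextual information. For
classification of pixel 0, the decision rule becomes the following: decide
$X_0 \in \omega_0 = i_0$, which maximizes $g_{i_0}(X_0)$, where

$$
g_{i_0}(X_0) = P(\omega = i_0 \mid X_0) \prod_{j=1}^{4}
\left[ (1 - \theta) + \theta\, \frac{P(\omega = i_0 \mid X_j)}{P(\omega = i_0)} \right]
$$

This form follows from equation (11) on dividing the numerator by
$\prod_{j=0}^{4} p(X_j)$, which does not depend on $i_0$.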
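The update of equation (11) can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the report's implementation: the function name `contextual_posterior` and its argument layout are hypothetical, and the per-pixel posteriors $P(\omega = k \mid X_j)$ are assumed to be already available from a noncontextual classifier.

```python
def contextual_posterior(priors, post_center, post_neighbors, theta):
    """Update the class posteriors of pixel 0 with its neighbors' evidence.

    priors[k]            -- a priori probability P(w = k)
    post_center[k]       -- P(w = k | X_0) from a noncontextual classifier
    post_neighbors[j][k] -- P(w = k | X_j) for each neighbor j
    theta                -- dependence parameter of equations (3) and (4)
    All names are illustrative assumptions.
    """
    scores = []
    for k, prior in enumerate(priors):
        g = post_center[k]
        for post in post_neighbors:
            # Each neighbor contributes the factor (1 - theta) + theta * P(k | X_j) / P(k).
            g *= (1.0 - theta) + theta * post[k] / prior
        scores.append(g)
    total = sum(scores)  # normalizing reproduces the denominator of equation (11)
    return [g / total for g in scores]


# Hypothetical two-class example: the center pixel weakly favors class 1,
# but all four neighbors strongly favor class 0.
updated = contextual_posterior(
    priors=[0.5, 0.5],
    post_center=[0.45, 0.55],
    post_neighbors=[[0.9, 0.1]] * 4,
    theta=0.5,
)
decision = max(range(len(updated)), key=lambda k: updated[k])
```

With `theta = 0` the neighbors are ignored and the function returns the noncontextual posteriors unchanged.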


3.2 SEQUENTIAL OR MARKOVIAN DEPENDENCE

This section considers the sequential or Markovian dependence between
neighboring pixels with the transition probabilities as described in
section 2 in terms of parameter $\theta$. This sequential model can be used to
classify a pixel using contextual information as follows. Consider a
3x3 neighborhood of pixel 0, shown in figure 2.

    8   1   2
    7   0   3
    6   5   4

Figure 2.- Illustration of 3x3 neighborhood.


Consider pixel 1. Its posteriori probabilities are updated using the
information from patterns of pixels 8 and 2 and similarly for pixels 7, 0, 3
and 6, 5, 4. Finally, the posteriori probabilities of the labels of the pattern
of pixel 0 are obtained using the ones of pixels 1 and 5. Now consider the
sequential neighborhood shown in figure 3.

    X_1,  X_2,  X_3,  ...,  X_{n-1},  X_n,  X_{n+1},  ...
    w_1   w_2   w_3   ...   w_{n-1}   w_n   w_{n+1}   ...

Figure 3.- Illustration of a sequential neighborhood.
Assume that the posteriori probabilities $P(\omega_n = i \mid X_n)$,
$i = 1, 2, \ldots, M$, are known. Then the problem is to update the posteriori
probabilities of $\omega_n$ using the information from $X_n$, $X_{n+1}$, and
$X_1, \ldots, X_{n-1}$.

The following assumptions are made. Given the identification of the label of
the $(n-1)$th pixel, the label of the $n$th pixel does not depend on the
patterns of pixels $1, 2, \ldots, n - 1$. That is,

$$
P(\omega_n = k \mid \omega_{n-1} = i, X_1, \ldots, X_{n-1}) = P(\omega_n = k \mid \omega_{n-1} = i)
\tag{12}
$$

It is also assumed that, given the identification of the label of a pattern,
its density does not depend on any other information. That is,

$$
p(X_n \mid \omega_n = k, \text{any other } X \text{ or } \omega) = p(X_n \mid \omega_n = k)
\tag{13}
$$


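With these assumptions, the contextual relations using sequential or
Markovian dependence are developed. Now consider

$$
\begin{aligned}
P(\omega_n = k \mid X_1, \ldots, X_{n-1})
&= \sum_{i=1}^{M} P(\omega_n = k, \omega_{n-1} = i \mid X_1, \ldots, X_{n-1}) \\
&= \sum_{i=1}^{M} P(\omega_n = k \mid \omega_{n-1} = i, X_1, \ldots, X_{n-1})\, P(\omega_{n-1} = i \mid X_1, \ldots, X_{n-1}) \\
&= \sum_{i=1}^{M} P(\omega_n = k \mid \omega_{n-1} = i)\, P(\omega_{n-1} = i \mid X_1, \ldots, X_{n-1}) \\
&= \sum_{i=1}^{M} \frac{P(\omega_n = k)}{P(\omega_{n-1} = i)}\, P(\omega_{n-1} = i \mid \omega_n = k)\, P(\omega_{n-1} = i \mid X_1, \ldots, X_{n-1}) \\
&= (1 - \theta)\, P(\omega_n = k) \sum_{i=1}^{M} P(\omega_{n-1} = i \mid X_1, \ldots, X_{n-1})
+ \theta\, \frac{P(\omega_n = k)}{P(\omega_{n-1} = k)}\, P(\omega_{n-1} = k \mid X_1, \ldots, X_{n-1}) \\
&= (1 - \theta)\, P(\omega_n = k)
+ \theta\, \frac{P(\omega_n = k)}{P(\omega_{n-1} = k)}\, P(\omega_{n-1} = k \mid X_1, \ldots, X_{n-1})
\end{aligned}
\tag{14}
$$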
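Using similar arguments, one can write the following:

$$
\begin{aligned}
p(X_n \mid X_1, \ldots, X_{n-1})
&= \sum_{i=1}^{M} p(X_n, \omega_n = i \mid X_1, \ldots, X_{n-1}) \\
&= \sum_{i=1}^{M} p(X_n \mid \omega_n = i, X_1, \ldots, X_{n-1})\, P(\omega_n = i \mid X_1, \ldots, X_{n-1}) \\
&= \sum_{i=1}^{M} p(X_n \mid \omega_n = i)\, P(\omega_n = i \mid X_1, \ldots, X_{n-1})
\end{aligned}
\tag{15}
$$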


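Now the posteriori probabilities of the label of pattern $X_n$ are updated using
the information from pattern $X_n$ and patterns $X_1, X_2, \ldots, X_{n-1}$ as

$$
\begin{aligned}
P(\omega_n = k \mid X_1, \ldots, X_n)
&= \frac{p(X_n, \omega_n = k, X_1, \ldots, X_{n-1})}{p(X_1, \ldots, X_n)} \\
&= \frac{p(X_n \mid \omega_n = k, X_1, \ldots, X_{n-1})\, P(\omega_n = k \mid X_1, \ldots, X_{n-1})\, p(X_1, \ldots, X_{n-1})}
{p(X_n \mid X_1, \ldots, X_{n-1})\, p(X_1, \ldots, X_{n-1})} \\
&= \frac{p(X_n \mid \omega_n = k)\, P(\omega_n = k \mid X_1, \ldots, X_{n-1})}
{\displaystyle\sum_{i=1}^{M} p(X_n \mid \omega_n = i)\, P(\omega_n = i \mid X_1, \ldots, X_{n-1})} \\
&= \frac{P(\omega_n = k \mid X_n)\, P(\omega_n = k \mid X_1, \ldots, X_{n-1}) / P(\omega_n = k)}
{\displaystyle\sum_{i=1}^{M} P(\omega_n = i \mid X_n)\, P(\omega_n = i \mid X_1, \ldots, X_{n-1}) / P(\omega_n = i)} \\
&= \frac{(1 - \theta)\, P(\omega_n = k \mid X_n)
+ \theta \left[ P(\omega_n = k \mid X_n)\, P(\omega_{n-1} = k \mid X_1, \ldots, X_{n-1}) / P(\omega_{n-1} = k) \right]}
{\displaystyle\sum_{i=1}^{M} \left\{ (1 - \theta)\, P(\omega_n = i \mid X_n)
+ \theta \left[ P(\omega_n = i \mid X_n)\, P(\omega_{n-1} = i \mid X_1, \ldots, X_{n-1}) / P(\omega_{n-1} = i) \right] \right\}}
\end{aligned}
\tag{16}
$$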
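The information in patterns $X_1, \ldots, X_n$ in obtaining the label of
pattern $X_{n+1}$ can be written as follows:

$$
\begin{aligned}
P(\omega_{n+1} = j \mid X_1, \ldots, X_n)
&= \sum_{i=1}^{M} P(\omega_{n+1} = j, \omega_n = i \mid X_1, \ldots, X_n) \\
&= \sum_{i=1}^{M} P(\omega_{n+1} = j \mid \omega_n = i)\, P(\omega_n = i \mid X_1, \ldots, X_n) \\
&= (1 - \theta)\, P(\omega_{n+1} = j) \sum_{i=1}^{M} P(\omega_n = i \mid X_1, \ldots, X_n)
+ \theta\, P(\omega_n = j \mid X_1, \ldots, X_n) \\
&= (1 - \theta)\, P(\omega_{n+1} = j) + \theta\, P(\omega_n = j \mid X_1, \ldots, X_n)
\end{aligned}
\tag{17}
$$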


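Consider

$$
\begin{aligned}
p(X_{n+1} \mid X_1, \ldots, X_n)
&= \sum_{j=1}^{M} p(X_{n+1}, \omega_{n+1} = j \mid X_1, \ldots, X_n) \\
&= \sum_{j=1}^{M} p(X_{n+1} \mid \omega_{n+1} = j)\, P(\omega_{n+1} = j \mid X_1, \ldots, X_n)
\end{aligned}
\tag{18}
$$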
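Using the patterns $X_1, X_2, \ldots, X_n, X_{n+1}$, one has

$$
\begin{aligned}
P(\omega_n = k \mid X_1, \ldots, X_n, X_{n+1})
&= \sum_{j=1}^{M} P(\omega_n = k, \omega_{n+1} = j \mid X_1, \ldots, X_n, X_{n+1}) \\
&= \sum_{j=1}^{M} \frac{p(X_{n+1} \mid \omega_{n+1} = j, \omega_n = k, X_1, \ldots, X_n)\,
P(\omega_{n+1} = j, \omega_n = k \mid X_1, \ldots, X_n)\, p(X_1, \ldots, X_n)}
{p(X_{n+1} \mid X_1, \ldots, X_n)\, p(X_1, \ldots, X_n)} \\
&= \frac{\displaystyle\sum_{j=1}^{M} p(X_{n+1} \mid \omega_{n+1} = j)\, P(\omega_{n+1} = j \mid \omega_n = k)\, P(\omega_n = k \mid X_1, \ldots, X_n)}
{p(X_{n+1} \mid X_1, \ldots, X_n)}
\end{aligned}
\tag{19}
$$

The numerator of equation (19) can be written as follows:

$$
\begin{aligned}
&\sum_{j=1}^{M} p(X_{n+1} \mid \omega_{n+1} = j)\, P(\omega_{n+1} = j \mid \omega_n = k)\, P(\omega_n = k \mid X_1, \ldots, X_n) \\
&\quad = (1 - \theta)\, P(\omega_n = k \mid X_1, \ldots, X_n) \sum_{j=1}^{M} p(X_{n+1} \mid \omega_{n+1} = j)\, P(\omega_{n+1} = j)
+ \theta\, p(X_{n+1} \mid \omega_{n+1} = k)\, P(\omega_n = k \mid X_1, \ldots, X_n)
\end{aligned}
\tag{20}
$$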
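Now from equations (17), (18), (19), and (20), one obtains

$$
P(\omega_n = k \mid X_1, \ldots, X_n, X_{n+1})
= \frac{(1 - \theta)\, P(\omega_n = k \mid X_1, \ldots, X_n)
+ \theta\, \dfrac{P(\omega_{n+1} = k \mid X_{n+1})}{P(\omega_{n+1} = k)}\, P(\omega_n = k \mid X_1, \ldots, X_n)}
{(1 - \theta) + \theta \displaystyle\sum_{j=1}^{M} \dfrac{P(\omega_{n+1} = j \mid X_{n+1})}{P(\omega_{n+1} = j)}\, P(\omega_n = j \mid X_1, \ldots, X_n)}
\tag{21}
$$

From equations (16) and (21), one obtains the desired result.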
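Equations (16) and (21) share one algebraic form: a base posterior is multiplied, class by class, by the factor $(1 - \theta) + \theta \cdot \text{evidence}/\text{prior}$ and then normalized. The sketch below illustrates that reading; the helper names (`reweight`, `forward_update`, `lookahead_update`) are hypothetical, and the a priori probabilities are assumed to be the same at every position in the sequence.

```python
def reweight(priors, base, evidence, theta):
    """Multiply base[k] by (1 - theta) + theta * evidence[k] / priors[k], then normalize."""
    scores = [
        base[k] * ((1.0 - theta) + theta * evidence[k] / priors[k])
        for k in range(len(priors))
    ]
    total = sum(scores)
    return [s / total for s in scores]


def forward_update(priors, prev_filtered, post_n, theta):
    """Equation (16): P(w_n = k | X_1..X_n) from P(w_n = k | X_n) and
    the previous result P(w_{n-1} = k | X_1..X_{n-1})."""
    return reweight(priors, post_n, prev_filtered, theta)


def lookahead_update(priors, filtered_n, post_next, theta):
    """Equation (21): P(w_n = k | X_1..X_{n+1}) from P(w_n = k | X_1..X_n)
    and the next pixel's own posterior P(w_{n+1} = k | X_{n+1})."""
    return reweight(priors, filtered_n, post_next, theta)


# Hypothetical two-class sequence of three pixels.
priors = [0.5, 0.5]
posts = [[0.9, 0.1], [0.6, 0.4], [0.8, 0.2]]  # P(w_n = k | X_n) for n = 1, 2, 3
filtered_1 = posts[0]
filtered_2 = forward_update(priors, filtered_1, posts[1], theta=0.5)
smoothed_2 = lookahead_update(priors, filtered_2, posts[2], theta=0.5)
```

In this example both the preceding and the following pixel favor class 0, so the probability of class 0 at the middle pixel increases at each step.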


3.3 UNSUPERVISED MAXIMUM LIKELIHOOD PARAMETER ESTIMATION 

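One of the methods of unsupervised learning or clustering is to assume that
the component densities of the mixture density are normal with unknown means
and covariance matrices, to draw samples independently from the mixture
density, and to estimate the parameters of the mixture density using the
maximum likelihood technique. Let $\mathcal{X} = \{X_1, \ldots, X_n\}$ be a set
of $n$ unlabeled samples that are drawn independently from the mixture density:

$$
p(X \mid \alpha) = \sum_{j=1}^{M} p(X \mid \omega = j, \alpha_j)\, P(\omega = j)
$$

where $\alpha$ is a vector of parameters of the mixture density and $\alpha_j$
is a vector of parameters of the $j$th component density. The likelihood of the
observed samples is, by definition, the joint density,

$$
p(\mathcal{X} \mid \alpha) = \prod_{k=1}^{n} p(X_k \mid \alpha)
$$

If $p(X \mid \omega = i)$ are assumed to be multivariate normal with the means
$\mu_i$ and covariance matrices $\Sigma_i$, the equations for the local maximum
likelihood estimates $\hat{\mu}_i$, $\hat{\Sigma}_i$, and $\hat{P}(\omega = i)$
under the constraints of $0 \le P(\omega = i) \le 1$ and
$\sum_{i=1}^{M} P(\omega = i) = 1$ are given by the following (ref. 7).

$$
\hat{P}(\omega = i) = \frac{1}{n} \sum_{k=1}^{n} \hat{P}(\omega = i \mid X_k, \hat{\alpha}_i)
$$

$$
\hat{\mu}_i = \frac{\displaystyle\sum_{k=1}^{n} \hat{P}(\omega = i \mid X_k, \hat{\alpha}_i)\, X_k}
{\displaystyle\sum_{k=1}^{n} \hat{P}(\omega = i \mid X_k, \hat{\alpha}_i)}
$$

$$
\hat{\Sigma}_i = \frac{\displaystyle\sum_{k=1}^{n} \hat{P}(\omega = i \mid X_k, \hat{\alpha}_i)\,(X_k - \hat{\mu}_i)(X_k - \hat{\mu}_i)^T}
{\displaystyle\sum_{k=1}^{n} \hat{P}(\omega = i \mid X_k, \hat{\alpha}_i)}
$$

where

$$
\begin{aligned}
\hat{P}(\omega = i \mid X_k, \hat{\alpha}_i)
&= \frac{p(X_k \mid \omega = i, \hat{\alpha}_i)\, \hat{P}(\omega = i)}
{\displaystyle\sum_{j=1}^{M} p(X_k \mid \omega = j, \hat{\alpha}_j)\, \hat{P}(\omega = j)} \\
&= \frac{|\hat{\Sigma}_i|^{-1/2} \exp\left[-\tfrac{1}{2}(X_k - \hat{\mu}_i)^T \hat{\Sigma}_i^{-1}(X_k - \hat{\mu}_i)\right] \hat{P}(\omega = i)}
{\displaystyle\sum_{j=1}^{M} |\hat{\Sigma}_j|^{-1/2} \exp\left[-\tfrac{1}{2}(X_k - \hat{\mu}_j)^T \hat{\Sigma}_j^{-1}(X_k - \hat{\mu}_j)\right] \hat{P}(\omega = j)}
\end{aligned}
$$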

In the application of this technique to the clustering of images in the
spectral domain, the parameters are updated after a split-and-merge sequence.
Updating the parameters involves the computation of the posteriors. The
contextual algorithms presented in sections 3.1 and 3.2 can be used for updating
the posteriors, with the estimates of transition probabilities from the local
neighborhood using the techniques developed in sections 4.1 and 4.2.
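One pass through the maximum likelihood equations above can be sketched as follows. To keep the example short and free of matrix code, it assumes univariate normal components (scalar samples, scalar variances) instead of the multivariate case treated in the text; the function names are illustrative. The posteriors computed in the first loop are the quantities that the contextual algorithms of sections 3.1 and 3.2 would replace.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density (a simplifying assumption for this sketch)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def ml_iteration(samples, priors, means, variances):
    """One update of the priors, means, and variances of a normal mixture."""
    m, n = len(priors), len(samples)
    # Posteriors P(w = i | X_k) under the current parameter estimates.
    posteriors = []
    for x in samples:
        joint = [priors[i] * normal_pdf(x, means[i], variances[i]) for i in range(m)]
        total = sum(joint)
        posteriors.append([value / total for value in joint])
    new_priors, new_means, new_vars = [], [], []
    for i in range(m):
        weight = sum(p[i] for p in posteriors)
        mean = sum(p[i] * x for p, x in zip(posteriors, samples)) / weight
        var = sum(p[i] * (x - mean) ** 2 for p, x in zip(posteriors, samples)) / weight
        new_priors.append(weight / n)
        new_means.append(mean)
        new_vars.append(var)
    return new_priors, new_means, new_vars


# Hypothetical data: two well-separated groups of scalar samples.
samples = [-2.1, -1.9, -2.0, 2.0, 2.1, 1.9]
params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
for _ in range(20):
    params = ml_iteration(samples, *params)
est_priors, est_means, est_vars = params
```

Repeating the update drives the estimates to a local maximum of the likelihood; the starting values above are arbitrary.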
4. LOCAL NEIGHBORHOOD MAXIMUM LIKELIHOOD ESTIMATION OF PARAMETER $\theta$


This section derives an expression for the likelihood function of patterns
$X_0, X_1, X_2, X_3, X_4$ given the parameter $\theta$ and obtains $\theta$ by
maximizing this function.


4.1 NEIGHBORHOOD OF FOUR 

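Consider a local neighborhood of pixel 0, illustrated in figure 1. The
likelihood function of $X_0, X_1, \ldots, X_4$ given $\theta$ can be written as
follows:

$$
\begin{aligned}
p(X_0, X_1, \ldots, X_4 \mid \theta)
&= \sum_{i_0=1}^{M} \sum_{i_1=1}^{M} \cdots \sum_{i_4=1}^{M}
p(X_0, \ldots, X_4, \omega_0 = i_0, \omega_1 = i_1, \ldots, \omega_4 = i_4 \mid \theta) \\
&= \sum_{i_0=1}^{M} \sum_{i_1=1}^{M} \cdots \sum_{i_4=1}^{M}
p(X_0, \ldots, X_4 \mid \omega_0 = i_0, \ldots, \omega_4 = i_4, \theta)\,
P(\omega_0 = i_0, \omega_1 = i_1, \ldots, \omega_4 = i_4 \mid \theta)
\end{aligned}
\tag{22}
$$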


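Consider

$$
\begin{aligned}
p(X_0, \ldots, X_4 \mid \omega_0 = i_0, \ldots, \omega_4 = i_4, \theta)
&= p(X_0 \mid \omega_0 = i_0)\, p(X_1, \ldots, X_4 \mid \omega_0 = i_0, \omega_1 = i_1, \ldots, \omega_4 = i_4, \theta) \\
&= \prod_{j=0}^{4} p(X_j \mid \omega_j = i_j)
\end{aligned}
\tag{23}
$$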


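To obtain equation (23), it was assumed that the probability density of a
pattern, given its label, is independent of any other information. Consider

$$
\begin{aligned}
P(\omega_0 = i_0, \omega_1 = i_1, \ldots, \omega_4 = i_4 \mid \theta)
&= P(\omega_0 = i_0)\, P(\omega_1 = i_1, \ldots, \omega_4 = i_4 \mid \omega_0 = i_0, \theta) \\
&= P(\omega_0 = i_0)\, P(\omega_1 = i_1 \mid \omega_0 = i_0, \omega_2 = i_2, \ldots, \omega_4 = i_4, \theta)\,
P(\omega_2 = i_2, \ldots, \omega_4 = i_4 \mid \omega_0 = i_0, \theta) \\
&= P(\omega_0 = i_0) \prod_{j=1}^{4} P(\omega_j = i_j \mid \omega_0 = i_0, \theta)
\end{aligned}
\tag{24}
$$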

The derivation of equation (24) is based on the assumption that the label of
a pattern is independent of the labels of the nonneighboring patterns. From
equations (22), (23), and (24), the following is obtained:

$$
p(X_0, X_1, \ldots, X_4 \mid \theta)
= \sum_{i_0=1}^{M} \sum_{i_1=1}^{M} \cdots \sum_{i_4=1}^{M}
P(\omega_0 = i_0)\, p(X_0 \mid \omega_0 = i_0)
\prod_{j=1}^{4} p(X_j \mid \omega_j = i_j)\, P(\omega_j = i_j \mid \omega_0 = i_0, \theta)
\tag{25}
$$
Interchanging the product and the summations results in the following
expression:

$$
p(X_0, X_1, \ldots, X_4 \mid \theta)
= \sum_{i_0=1}^{M} P(\omega_0 = i_0)\, p(X_0 \mid \omega_0 = i_0)
\prod_{j=1}^{4} \left[ \sum_{i_j=1}^{M} p(X_j \mid \omega_j = i_j)\, P(\omega_j = i_j \mid \omega_0 = i_0, \theta) \right]
\tag{26}
$$


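Since $\prod_{j=0}^{4} p(X_j)$ is independent of $\theta$, dividing both sides
of the above equation by it and noting that the a priori probabilities are
independent of pattern location, one obtains

$$
\begin{aligned}
\frac{p(X_0, X_1, \ldots, X_4 \mid \theta)}{\displaystyle\prod_{j=0}^{4} p(X_j)}
&= \sum_{i_0=1}^{M} P(\omega = i_0 \mid X_0) \prod_{j=1}^{4}
\left[ \sum_{i_j=1}^{M} \frac{P(\omega = i_j \mid X_j)}{P(\omega = i_j)}\, P(\omega = i_j \mid \omega = i_0, \theta) \right] \\
&= \sum_{i_0=1}^{M} P(\omega = i_0 \mid X_0) \prod_{j=1}^{4}
\left[ (1 - \theta) \sum_{i_j=1}^{M} P(\omega = i_j \mid X_j) + \theta\, \frac{P(\omega = i_0 \mid X_j)}{P(\omega = i_0)} \right] \\
&= \sum_{i_0=1}^{M} P(\omega = i_0 \mid X_0) \prod_{j=1}^{4}
\left[ (1 - \theta) + \theta\, \frac{P(\omega = i_0 \mid X_j)}{P(\omega = i_0)} \right]
\end{aligned}
\tag{27}
$$

Let

$$
L(\theta) = \frac{p(X_0, X_1, \ldots, X_4 \mid \theta)}{\displaystyle\prod_{j=0}^{4} p(X_j)}
\tag{28}
$$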


Expanding equation (27) yields

$$
L(\theta) = (1 - \theta)^4 + \theta(1 - \theta)^3 A + \theta^2(1 - \theta)^2 B + \theta^3(1 - \theta) C + \theta^4 D
\tag{29}
$$
where

$$
A = \sum_{i_0=1}^{M} \frac{P(\omega = i_0 \mid X_0)}{P(\omega = i_0)}
\sum_{j=1}^{4} P(\omega = i_0 \mid X_j)
$$

$$
B = \sum_{i_0=1}^{M} \frac{P(\omega = i_0 \mid X_0)}{P^2(\omega = i_0)}
\sum_{1 \le j < k \le 4} P(\omega = i_0 \mid X_j)\, P(\omega = i_0 \mid X_k)
$$

$$
C = \sum_{i_0=1}^{M} \frac{P(\omega = i_0 \mid X_0)}{P^3(\omega = i_0)}
\sum_{1 \le j < k < l \le 4} P(\omega = i_0 \mid X_j)\, P(\omega = i_0 \mid X_k)\, P(\omega = i_0 \mid X_l)
$$

$$
D = \sum_{i_0=1}^{M} \frac{P(\omega = i_0 \mid X_0)}{P^4(\omega = i_0)}
\prod_{j=1}^{4} P(\omega = i_0 \mid X_j)
$$

To maximize $L(\theta)$, the derivative of $L(\theta)$ is taken with respect to
$\theta$, and the resulting expression is equated to zero. This results in

$$
a\theta^3 + b\theta^2 + c\theta + d = 0
\tag{30}
$$

where

$$
\begin{aligned}
a &= 4 - 4A + 4B - 4C + 4D, & b &= -12 + 9A - 6B + 3C \\
c &= 12 - 6A + 2B, & d &= -4 + A
\end{aligned}
$$

Equation (30) is a cubic equation with real coefficients; hence, it will
have either three real roots or one real root and two complex roots. With
a change of variable (ref. 8),

$$
Z = 3a\theta + b
\tag{31}
$$

one obtains from equation (30)

$$
Z^3 + 3HZ + G = 0
\tag{32}
$$

where

$$
\begin{aligned}
H &= 3ac - b^2 \\
G &= 27a^2 d - 9abc + 2b^3
\end{aligned}
\tag{33}
$$
Let



$L(\theta)$ is a continuous function of $\theta$. The $\theta$ in the range
$0 \le \theta \le 1$ that maximizes $L(\theta)$ can be found using figure 4
(ref. 9).

Figure 4.- Procedure for finding $\theta_{opt}$ in the range
$0 \le \theta \le 1$, which gives the global maximum for $L(\theta)$.

The $\theta_{opt}$ of this section can be used with the spatially uniform
context algorithm presented in section 3.1.
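Because $L(\theta)$ is continuous on a closed interval, its global maximum over $0 \le \theta \le 1$ lies either at an end point or at a real root of equation (30) inside the interval. The sketch below follows that idea but is not the flow diagram of figure 4: instead of the closed-form cubic solution it locates sign changes of the derivative on a grid and refines them by bisection, which is a swapped-in numerical technique. The function names and the grid size are assumptions, and two roots falling inside one grid cell would be missed.

```python
def likelihood4(theta, A, B, C, D):
    """Equation (29): L(theta) for the four-neighbor case."""
    u = 1.0 - theta
    return (u ** 4 + theta * u ** 3 * A + theta ** 2 * u ** 2 * B
            + theta ** 3 * u * C + theta ** 4 * D)


def derivative4(theta, A, B, C, D):
    """Left-hand side of equation (30), i.e. dL/dtheta."""
    a = 4 - 4 * A + 4 * B - 4 * C + 4 * D
    b = -12 + 9 * A - 6 * B + 3 * C
    c = 12 - 6 * A + 2 * B
    d = -4 + A
    return ((a * theta + b) * theta + c) * theta + d


def best_theta4(A, B, C, D, steps=1000):
    """Return the theta in [0, 1] that gives the largest L(theta)."""
    candidates = [0.0, 1.0]  # end points are always candidates
    for i in range(steps):
        lo, hi = i / steps, (i + 1) / steps
        f_lo = derivative4(lo, A, B, C, D)
        f_hi = derivative4(hi, A, B, C, D)
        if f_lo * f_hi < 0.0:
            # Sign change: refine the stationary point by bisection.
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                f_mid = derivative4(mid, A, B, C, D)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            candidates.append(0.5 * (lo + hi))
        elif f_hi == 0.0:
            candidates.append(hi)
    return max(candidates, key=lambda t: likelihood4(t, A, B, C, D))


# Hypothetical coefficients: three neighbors agree with the center, one disagrees.
theta_opt = best_theta4(4.24, 6.72, 4.6336, 1.024)
```

When all four neighbors strongly agree with the center pixel the derivative stays positive and the function returns 1; in the mixed example above the maximum falls strictly inside the interval.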


4.2 NEIGHBORHOOD OF TWO


Consider a local neighborhood of pixel 0, illustrated in figure 5.

    X_1   X_0   X_2
     1     0     2

Figure 5.- Two neighbors of pixel 0.


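Let

$$
L(\theta) = \frac{p(X_0, X_1, X_2 \mid \theta)}{\displaystyle\prod_{j=0}^{2} p(X_j)}
\tag{37}
$$

Using similar arguments as in section 4.1, one obtains an expression for
$L(\theta)$ as follows:

$$
L(\theta) = (1 - \theta)^2 + \theta(1 - \theta)A + \theta^2 B
\tag{38}
$$

where

$$
A = \sum_{i_0=1}^{M} \frac{P(\omega = i_0 \mid X_0)}{P(\omega = i_0)}
\left[ P(\omega = i_0 \mid X_1) + P(\omega = i_0 \mid X_2) \right]
$$

and

$$
B = \sum_{i_0=1}^{M} \frac{P(\omega = i_0 \mid X_0)}{P^2(\omega = i_0)}
\left[ P(\omega = i_0 \mid X_1)\, P(\omega = i_0 \mid X_2) \right]
$$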

Taking the derivative of $L(\theta)$ with respect to $\theta$ and equating the
resulting expression to 0 yields

$$
\frac{dL(\theta)}{d\theta} = 2(\theta - 1) + (1 - 2\theta)A + 2\theta B = 0
\tag{39}
$$

The root $\theta_1$ of equation (39) is given by

$$
\theta_1 = \frac{2 - A}{2(1 - A + B)}
\tag{40}
$$

Because $L(\theta)$ is a continuous function of $\theta$, the optimal value of
$\theta$, $\theta_{opt}$, in the range $0 \le \theta \le 1$, which gives the
maximum value for $L(\theta)$, can be found using a procedure similar to that
given in figure 4. The $\theta_{opt}$ can then be used with the sequential
contextual algorithm presented in section 3.2.
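For the two-neighbor case the search reduces to comparing at most three values of $L(\theta)$: the two end points and the stationary point of equation (39) when it lies inside the interval. A minimal sketch, with a hypothetical function name and made-up coefficient values:

```python
def best_theta2(A, B):
    """Return the theta in [0, 1] maximizing L(theta) = (1-t)^2 + t(1-t)A + t^2 B."""

    def likelihood(t):
        return (1.0 - t) ** 2 + t * (1.0 - t) * A + t * t * B

    candidates = [0.0, 1.0]
    denominator = 2.0 * (1.0 - A + B)
    if denominator != 0.0:
        root = (2.0 - A) / denominator  # stationary point of equation (39)
        if 0.0 <= root <= 1.0:
            candidates.append(root)
    return max(candidates, key=likelihood)


theta_interior = best_theta2(2.4, 1.0)  # L = 1 + 0.4 t - 0.4 t^2, maximum at t = 0.5
theta_boundary = best_theta2(2.0, 3.0)  # L = 1 + 2 t^2, maximum at t = 1
```

Evaluating $L(\theta)$ at the candidates, rather than trusting the stationary point alone, guards against the case where the root is a minimum.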

5. CONCLUSIONS 

This paper considers the problem of incorporating contextual or spatial
information into classification. The dependencies between neighboring
patterns are modeled through a single parameter $\theta$, which describes the
transition probabilities of the classes of the neighboring patterns.

Expressions are derived for updating the posteriori probabilities of the
classes of the pattern under consideration using contextual information both
for a spatially uniform contextual model and for sequential or Markovian
dependencies between neighboring patterns. A likelihood function for the
patterns in the neighborhood of the pattern under consideration, given the
parameter $\theta$, is derived, and the optimal value of $\theta$ can be
obtained by maximizing the likelihood function.

The techniques presented in this paper can be used for the incorporation
of contextual information both for supervised and unsupervised
classifications. Incorporation of context in unsupervised learning or
clustering by using maximum likelihood estimates for the parameters of a
mixture density with Gaussian component densities is briefly described.

Instead of using one parameter $\theta$ in the neighborhood of the pattern
under consideration, transition probability models with different parameters
can be used, as shown in appendix A. The techniques can be extended, as shown
in appendix B, to multitemporal or time-varying situations such as those
encountered in remote sensing.

The procedures developed for a local neighborhood estimate of $\theta$ can be
used under some other modeling of transition probabilities as long as the
transition probability modeling satisfies the probability postulates. For
example, $\theta$ can be replaced with a function of a parameter $\alpha$ that
lies between 0 and $\infty$ or with a function of a parameter $\beta$ that can
be between $-\infty$ and $\infty$.

APPENDIX A


ESTIMATION OF TRANSITION PROBABILITIES WITH DIFFERENT 
PARAMETERS IN THE LOCAL NEIGHBORHOOD 


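This appendix develops some results for estimating the parameters of
transition probabilities under different models for horizontal and vertical
neighbors. Let $\theta_H$ and $\theta_V$ be the parameters of the transition
probability model for horizontal and vertical neighbors, respectively. For the
local neighborhood illustrated in figure 1, with pixels 2 and 4 taken as the
horizontal neighbors and pixels 1 and 3 as the vertical neighbors, consider the
following equation from section 4.1:

$$
\begin{aligned}
L(\bar{\theta}) = L(\theta_H, \theta_V)
&= \frac{p(X_0, X_1, \ldots, X_4 \mid \bar{\theta})}{\displaystyle\prod_{j=0}^{4} p(X_j)} \\
&= \sum_{i_0=1}^{M} P(\omega = i_0 \mid X_0)
\prod_{j=1,3} \left[ (1 - \theta_V) + \theta_V \frac{P(\omega = i_0 \mid X_j)}{P(\omega = i_0)} \right]
\prod_{j=2,4} \left[ (1 - \theta_H) + \theta_H \frac{P(\omega = i_0 \mid X_j)}{P(\omega = i_0)} \right] \\
&= (1 - \theta_V)^2 (1 - \theta_H)^2 + (1 - \theta_V)^2 (1 - \theta_H)\theta_H\, \alpha_H + (1 - \theta_V)^2 \theta_H^2\, \beta_H \\
&\quad + (1 - \theta_V)\theta_V (1 - \theta_H)^2\, \alpha_V + (1 - \theta_V)\theta_V (1 - \theta_H)\theta_H\, \alpha_{VH} + (1 - \theta_V)\theta_V \theta_H^2\, \beta_{HV} \\
&\quad + \theta_V^2 (1 - \theta_H)^2\, \beta_V + \theta_V^2 (1 - \theta_H)\theta_H\, \beta_{VH} + \theta_V^2 \theta_H^2\, \gamma_{VH}
\end{aligned}
\tag{A-1}
$$

where

$$
\begin{aligned}
\alpha_H &= \sum_{i_0=1}^{M} \frac{P(\omega = i_0 \mid X_0)}{P(\omega = i_0)} \left[ P(\omega = i_0 \mid X_2) + P(\omega = i_0 \mid X_4) \right] \\
\beta_H &= \sum_{i_0=1}^{M} \frac{P(\omega = i_0 \mid X_0)}{P^2(\omega = i_0)} \left[ P(\omega = i_0 \mid X_2)\, P(\omega = i_0 \mid X_4) \right] \\
\alpha_V &= \sum_{i_0=1}^{M} \frac{P(\omega = i_0 \mid X_0)}{P(\omega = i_0)} \left[ P(\omega = i_0 \mid X_1) + P(\omega = i_0 \mid X_3) \right] \\
\beta_V &= \sum_{i_0=1}^{M} \frac{P(\omega = i_0 \mid X_0)}{P^2(\omega = i_0)} \left[ P(\omega = i_0 \mid X_1)\, P(\omega = i_0 \mid X_3) \right] \\
\alpha_{VH} &= \sum_{i_0=1}^{M} \frac{P(\omega = i_0 \mid X_0)}{P^2(\omega = i_0)} \left[ P(\omega = i_0 \mid X_1) + P(\omega = i_0 \mid X_3) \right] \left[ P(\omega = i_0 \mid X_2) + P(\omega = i_0 \mid X_4) \right] \\
\gamma_{VH} &= \sum_{i_0=1}^{M} \frac{P(\omega = i_0 \mid X_0)}{P^4(\omega = i_0)} \left[ P(\omega = i_0 \mid X_1)\, P(\omega = i_0 \mid X_3) \right] \left[ P(\omega = i_0 \mid X_2)\, P(\omega = i_0 \mid X_4) \right] \\
\beta_{VH} &= \sum_{i_0=1}^{M} \frac{P(\omega = i_0 \mid X_0)}{P^3(\omega = i_0)} \left[ P(\omega = i_0 \mid X_2) + P(\omega = i_0 \mid X_4) \right] \left[ P(\omega = i_0 \mid X_1)\, P(\omega = i_0 \mid X_3) \right] \\
\beta_{HV} &= \sum_{i_0=1}^{M} \frac{P(\omega = i_0 \mid X_0)}{P^3(\omega = i_0)} \left[ P(\omega = i_0 \mid X_1) + P(\omega = i_0 \mid X_3) \right] \left[ P(\omega = i_0 \mid X_2)\, P(\omega = i_0 \mid X_4) \right]
\end{aligned}
$$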
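To determine $\theta_V$ and $\theta_H$ that maximize equation (A-1), one takes
partial derivatives of equation (A-1) with respect to $\theta_V$ and $\theta_H$
and solves the resulting equations for $\theta_V$ and $\theta_H$. Taking the
partial derivative of equation (A-1) with respect to $\theta_V$, equating the
resulting expression to zero, and solving for $\theta_V$, one obtains

$$
\theta_V = \frac{1}{2}\, \frac{a_{N2}\theta_H^2 + a_{N1}\theta_H + a_{N0}}{a_{D2}\theta_H^2 + a_{D1}\theta_H + a_{D0}}
\tag{A-2}
$$

where

$$
\begin{aligned}
a_{N2} &= 2 - 2\alpha_H + 2\beta_H - \alpha_V + \alpha_{VH} - \beta_{HV} \\
a_{N1} &= -4 + 2\alpha_H + 2\alpha_V - \alpha_{VH} \\
a_{N0} &= 2 - \alpha_V \\
a_{D2} &= 1 - \alpha_H - \alpha_V + \beta_H + \beta_V + \alpha_{VH} - \beta_{HV} - \beta_{VH} + \gamma_{VH} \\
a_{D1} &= -2 + \alpha_H + 2\alpha_V - 2\beta_V - \alpha_{VH} + \beta_{VH} \\
a_{D0} &= 1 - \alpha_V + \beta_V
\end{aligned}
$$

Similarly, taking the partial derivative of equation (A-1) with respect to
$\theta_H$, equating the resulting expression to zero, and solving for
$\theta_H$, one obtains

$$
\theta_H = \frac{1}{2}\, \frac{b_{N2}\theta_V^2 + b_{N1}\theta_V + b_{N0}}{b_{D2}\theta_V^2 + b_{D1}\theta_V + b_{D0}}
\tag{A-3}
$$

where

$$
\begin{aligned}
b_{N2} &= 2 - 2\alpha_V + 2\beta_V - \alpha_H + \alpha_{VH} - \beta_{VH} \\
b_{N1} &= -4 + 2\alpha_V + 2\alpha_H - \alpha_{VH} \\
b_{N0} &= 2 - \alpha_H \\
b_{D2} &= 1 - \alpha_H - \alpha_V + \beta_H + \beta_V + \alpha_{VH} - \beta_{HV} - \beta_{VH} + \gamma_{VH} \\
b_{D1} &= -2 + \alpha_V + 2\alpha_H - 2\beta_H - \alpha_{VH} + \beta_{HV} \\
b_{D0} &= 1 - \alpha_H + \beta_H
\end{aligned}
$$

Substituting for $\theta_V$ from equation (A-2) into (A-3) results in a
fifth-order algebraic equation in $\theta_H$ whose roots can be obtained by
numerical methods (refs. 10 and 11). Let the resulting roots be
$\theta_H(i)$, $i = 1, \ldots, 5$. From equation (A-2), one then obtains the
corresponding values for $\theta_V(i)$, $i = 1, \ldots, 5$.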
Let



(A-4) 


(A-5) 


Now the optimum value of $\bar{\theta}$, for $0 \le \theta_H \le 1$ and
$0 \le \theta_V \le 1$, which maximizes equation (A-1), can be obtained from
the flow diagram given in figure A-1. The above analysis can be generalized
for obtaining the parameters of transition probabilities which have different
parameters for more than two directions.
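As a simple cross-check on the two-parameter estimate, the likelihood can be evaluated in its product form and maximized by brute force over a grid on the unit square. This is a swapped-in technique, not the fifth-order root procedure described above, and the function names, grid resolution, and example posteriors are illustrative assumptions.

```python
def likelihood_hv(priors, post_center, post_h, post_v, theta_h, theta_v):
    """L(theta_H, theta_V) in product form, before expansion into (A-1).

    post_h and post_v hold the posteriors P(w = k | X_j) of the horizontal
    and vertical neighbor pairs, respectively.
    """
    total = 0.0
    for k, prior in enumerate(priors):
        term = post_center[k]
        for post in post_h:
            term *= (1.0 - theta_h) + theta_h * post[k] / prior
        for post in post_v:
            term *= (1.0 - theta_v) + theta_v * post[k] / prior
        total += term
    return total


def grid_search_hv(priors, post_center, post_h, post_v, steps=50):
    """Return the grid point (theta_H, theta_V) with the largest likelihood."""
    best_pair, best_value = (0.0, 0.0), float("-inf")
    for a in range(steps + 1):
        for b in range(steps + 1):
            theta_h, theta_v = a / steps, b / steps
            value = likelihood_hv(priors, post_center, post_h, post_v, theta_h, theta_v)
            if value > best_value:
                best_pair, best_value = (theta_h, theta_v), value
    return best_pair


# Hypothetical example: horizontal neighbors agree with the center pixel,
# vertical neighbors disagree (as along a horizontal boundary).
estimate = grid_search_hv(
    priors=[0.5, 0.5],
    post_center=[0.9, 0.1],
    post_h=[[0.9, 0.1], [0.9, 0.1]],
    post_v=[[0.1, 0.9], [0.1, 0.9]],
)
```

In this example the search selects full dependence in the horizontal direction and independence in the vertical direction, the behavior that motivates separate parameters.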

Figure A-1.- Procedure for finding $\bar{\theta}_{opt}$, which gives a
global maximum of $L(\bar{\theta})$.

APPENDIX B 


MULTITEMPORAL INTERPRETATION OF CONTEXT 

This appendix gives a multitemporal interpretation of the theory developed in
the paper for applications in remote sensing. In remote sensing, the sensor
system usually makes several passes over the same ground area and acquires a
set of data for each pass or acquisition. These passes are registered, and
classification is performed on the registered data. It is assumed that there
are $r$ acquisitions. For the pixel under consideration, in each acquisition
a data vector $X_i$, $i = 1, 2, \ldots, r$, is acquired. Suppose that
acquisitions $2, \ldots, r$ are registered with respect to acquisition 1. In
registration, errors are encountered. Let the classifier be trained on the data
from these individual acquisitions, obtaining the probability density functions
$p(X \mid \omega = i)$, $i = 1, 2, \ldots, M$. The following paragraphs discuss
the application of the theory developed in the paper to obtaining the label of
the pixel under consideration using the data $X_i$, $i = 1, 2, \ldots, r$, while
minimizing the effect of registration errors and incorporating the context. The
pixel is classified using the decision rule: classify it into class
$\omega = j$ if

$$
P(\omega = j \mid X_1, \ldots, X_r) \ge P(\omega = i \mid X_1, \ldots, X_r),
\quad i = 1, 2, \ldots, M, \; i \neq j
\tag{B-1}
$$

The registration errors are assumed to be modeled through the model for
transition probabilities given in equations (3) and (4). Since
$p(X_1, \ldots, X_r)$ is independent of $i$, equation (B-1) is equivalent to
classifying the pixel as $\omega = j$ if

$$
P(\omega = j)\, p(X_1, \ldots, X_r \mid \omega = j) \ge P(\omega = i)\, p(X_1, \ldots, X_r \mid \omega = i),
\quad i = 1, 2, \ldots, M, \; i \neq j
\tag{B-2}
$$
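From the theory developed in the paper, one has

$$
\begin{aligned}
p(X_1, \ldots, X_r \mid \omega = j)
&= \sum_{i_1=1}^{M} \cdots \sum_{i_r=1}^{M} p(X_1, \ldots, X_r, \omega_1 = i_1, \ldots, \omega_r = i_r \mid \omega = j) \\
&= \sum_{i_1=1}^{M} \cdots \sum_{i_r=1}^{M} p(X_1, \ldots, X_r \mid \omega_1 = i_1, \ldots, \omega_r = i_r, \omega = j)\,
P(\omega_1 = i_1, \ldots, \omega_r = i_r \mid \omega = j) \\
&= \sum_{i_1=1}^{M} \cdots \sum_{i_r=1}^{M} \left[ \prod_{l=1}^{r} p(X_l \mid \omega_l = i_l) \right]
P(\omega_1 = i_1 \mid \omega = j) \prod_{l=2}^{r} P(\omega_l = i_l \mid \omega_1 = i_1) \\
&= \sum_{i_1=1}^{M} p(X_1 \mid \omega_1 = i_1)\, P(\omega_1 = i_1 \mid \omega = j)
\prod_{l=2}^{r} \left[ \sum_{i_l=1}^{M} p(X_l \mid \omega_l = i_l)\, P(\omega_l = i_l \mid \omega_1 = i_1) \right] \\
&= \sum_{i_1=1}^{M} p(X_1 \mid \omega_1 = i_1)\, P(\omega_1 = i_1 \mid \omega = j)
\prod_{l=2}^{r} \left[ (1 - \theta) \sum_{i_l=1}^{M} p(X_l \mid \omega_l = i_l)\, P(\omega_l = i_l) + \theta\, p(X_l \mid \omega_l = i_1) \right]
\end{aligned}
\tag{B-3}
$$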


The following assumptions are made in the derivation of equation (B-3). The
density function of $X_l$ given its label identification is independent of any
other information. The label $\omega_1 = i_1$, the class of $X_1$, does not
depend on the label of the combined data $X_1, \ldots, X_r$. Because the
acquisitions $2, \ldots, r$ are assumed to be registered with respect to the
first acquisition, the transition probabilities are assumed to obey

$$
P(\omega_l = i_l \mid \omega_1 = i_1, \text{other } \omega) = P(\omega_l = i_l \mid \omega_1 = i_1)
$$

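Using arguments similar to the ones in sections 4.1 and 4.2, one can write
the likelihood function of $X_1, X_2, \ldots, X_r$ given $\theta$ as

$$
\begin{aligned}
L(\theta) = \frac{p(X_1, X_2, \ldots, X_r \mid \theta)}{\displaystyle\prod_{l=1}^{r} p(X_l)}
&= \frac{1}{\displaystyle\prod_{l=1}^{r} p(X_l)} \sum_{i_1=1}^{M} \cdots \sum_{i_r=1}^{M}
p(X_1, \ldots, X_r, \omega_1 = i_1, \ldots, \omega_r = i_r \mid \theta) \\
&= \frac{1}{\displaystyle\prod_{l=1}^{r} p(X_l)} \sum_{i_1=1}^{M} \cdots \sum_{i_r=1}^{M}
\left[ \prod_{l=1}^{r} p(X_l \mid \omega_l = i_l) \right]
P(\omega_2 = i_2, \ldots, \omega_r = i_r \mid \omega_1 = i_1, \theta)\, P(\omega_1 = i_1) \\
&= \sum_{i_1=1}^{M} P(\omega_1 = i_1 \mid X_1) \prod_{l=2}^{r}
\left[ (1 - \theta) + \theta\, \frac{P(\omega_l = i_1 \mid X_l)}{P(\omega_l = i_1)} \right]
\end{aligned}
\tag{B-4}
$$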

The $\theta$ can be obtained by maximizing equation (B-4); it is used in
equation (B-3) in obtaining the label $\omega = j$ of $X_1, \ldots, X_r$. It is
to be noted that this multitemporal interpretation can easily be coupled with
the contextual classification techniques developed in the paper.